Construction of Diachronic Ontologies from People's Daily of Fifty Years

نویسندگان

  • Shaoda He
  • Xiaojun Zou
  • Liumingjing Xiao
  • Junfeng Hu
چکیده

This paper presents an Ontology Learning From Text (OLFT) method follows the well-known OLFT cake layer framework. Based on the distributional similarity, the proposed method generates multi-level ontologies from comparatively small corpora with the aid of HITS algorithm. Currently, this method covers terms extraction, synonyms recognition, concepts discovery and concepts hierarchical clustering. Among them, both concepts discovery and concepts hierarchical clustering are aided by the HITS authority, which is obtained from the HITS algorithm by an iteratively recommended way. With this method, a set of diachronic ontologies is constructed for each year based on People’s Daily corpora of fifty years (i.e., from 1947 to 1996). Preliminary experiments show that our algorithm outperforms the Google’s RNN and K-means based algorithm in both concepts discovery and concepts hierarchical clustering.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

طراحی سامانه هوشمند ساخت هستان نگار به کمک شبکه عصبی ARTو روشC-value

In recent years, many efforts have been done to design ontology learning methods and automate ontology construction process. The ontology construction process is a time-consuming and costly procedure for almost all domains/applications, so automating this process is a solution to overcome the knowledge acquisition bottleneck in information systems and reduce the construction cost. In this artic...

متن کامل

Gearing the Discursive Practice to the Evolution of Discipline: Diachronic Corpus Analysis of Stance Markers in Research Articles’ Methodology Section

Despite widespread interest and research among applied linguists to explore metadiscourse use, very little is known of how metadiscourse resources have evolved over time in response to the historically developing practices of academic communities. Motivated by such an ambition, the current research drew on a corpus of 874315 words taken from three leading journals of applied linguistics in orde...

متن کامل

Investigating Lexico-grammaticality in Academic Abstracts and Their Full Research Papers from a Diachronic Perspective

Development of science and academic knowledge has led to changes in academic language and transfer of information and knowledge. In this regard, the present study is an attempt to investigate lexico-grammaticality in academic abstracts and their full research papers in Linguistics, Chemistry and Electrical engineering papers published during 1991-2015 in academic journals from a diachronic pers...

متن کامل

بررسی هستان شناسی های توسعه یافته مبتنی بر اصول هستان شناسی های منبع باز زیست پزشکی

Background and Aim: Ontologies facilitate data integration, exchange, searching and querying. Open Biomedical Ontologies (OBO) Foundry is a solution for creating reference ontologies. In this foundry, the design of ontologies is based on established principles which allow for their interactions as a single system. The purpose of this study is to determine the main features of ontologies develop...

متن کامل

Multiple Tokenizations in a Diachronic Corpus

This paper deals with the construction of a maximally flexible corpus architecture for building and analyzing diachronic corpora. Historical data poses many challenges with regard to representation and analysis, and diachronic corpora are even more varied and unsystematic (Claridge, 2008). Since historical and diachronic corpora are so difficult and expensive to build, it is crucial that they b...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014